An Algorithm Combining Statistics-based and Rules-based for Chunk Identification of Chinese Sentences

نویسندگان

  • Rongbo Wang
  • Zheru Chi
چکیده

Natural language processing (NLP) is a very hot research domain. One important branch of it is sentence analysis, including Chinese sentence analysis. However, currently, no mature deep analysis theories and techniques are available. An alternative way is to perform shallow parsing on sentences which is very popular in the domain. The chunk identification is a fundamental task for shallow parsing. The purpose of this paper is to characterize a chunk boundary parsing algorithm, using a statistical method combining adjustment rules, which serves as a supplement to traditional statistics-based parsing methods. The experimental results show that the model works well on the small dataset. It will contribute to the sequent processes like chunk tagging and chunk collocation extraction under other topics etc.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Terminology of Combining the Sentences of Farsi Language with the Viterbi Algorithm and BI-GRAM Labeling

This paper, based on the Viterbi algorithm, selects the most likely combination of different wording from a variety of scenarios. In this regard, the Bi-gram and Unigram tags of each word, based on the letters forming the words, as well as the bigram and unigram labels After the breakdown into the composition or moment of transition from the decomposition to the combination obtained from th...

متن کامل

Chunk-Level Reordering of Source Language Sentences with Automatically Learned Rules for Statistical Machine Translation

In this paper, we describe a sourceside reordering method based on syntactic chunks for phrase-based statistical machine translation. First, we shallow parse the source language sentences. Then, reordering rules are automatically learned from source-side chunks and word alignments. During translation, the rules are used to generate a reordering lattice for each sentence. Experimental results ar...

متن کامل

Building A Chinese Text Summarizer with Phrasal Chunks and Domain Knowledge

This paper introduces a Chinese summarizier called ThemePicker. Though the system incorporates both statistical and text analysis models, the statistical model plays a major role during the automated process. In addition to word segmentation and proper names identification, phrasal chunk extraction and content density calculation are based on a semantic network pre-constructed for a chosen doma...

متن کامل

Automatic Rule Acquisition for Chinese Intra-chunk Relations

Multiword chunking is defined as a task to automatically analyze the external function and internal structure of the multiword chunk(MWC) in a sentence. To deal with this problem, we proposed a rule acquisition algorithm to automatically learn a chunk rule base, under the support of a large scale annotated corpus and a lexical knowledge base. We also proposed an expectation precision index to o...

متن کامل

An Optimal Approach to Local and Global Text Coherence Evaluation Combining Entity-based, Graph-based and Entropy-based Approaches

Text coherence evaluation becomes a vital and lovely task in Natural Language Processing subfields, such as text summarization, question answering, text generation and machine translation. Existing methods like entity-based and graph-based models are engaging with nouns and noun phrases change role in sequential sentences within short part of a text. They even have limitations in global coheren...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006